A JDBC Driver Supporting Data Source Integration and Evolution


Statement of the Problem Studied


The problem studied is how to rapidly integrate information from multiple data sources. Current approaches
perform integration [Halevy01] by building a global view and then mapping queries on the global view to the
data sources. Global views are still built using manual techniques [Batini86]. Thus, integration is costly and
time-consuming, because building the global view is a bottleneck in the process. Further, although many
integration systems and prototypes have been developed [Goh99,Kirk95,Li98], none have remained viable,
usable products. The reason is that they were built using proprietary technology and require expertise
beyond that of most developers and users.

The goal of this research is to demonstrate that practical, rapid information fusion can be achieved by:

■ Building an integration architecture using common industrial standards such as Java and Java Database
Connectivity (JDBC).

■ Developing a system for documenting data source contents so that they can be rapidly shared and integrated.

■ Defining a high-level query language that allows users to specify the concepts they want without indicating
how to retrieve them. The language must hide integration details and be easier to use than SQL.

■ Supporting "real data" by handling data inconsistency, incompleteness, and outlying values, so that meaningful
outliers are not filtered out by the integration system and remain available to data mining and decision
support systems.

The research product of this project is a JDBC driver that allows Java programs to transparently query data
sources without specifying structural queries (SQL). Programs and users submit semantic queries to the JDBC
driver, which translates the high-level queries into SQL queries for the appropriate data sources. This automatic
translation isolates users and applications from the complexity of querying multiple data sources and allows
applications to function in the presence of schema evolution in the underlying data sources.
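To make the translation step concrete, here is a minimal Java sketch under stated assumptions; it is not the UnityDriver implementation. It assumes a hypothetical in-memory dictionary that maps each concept name (such as "soldier name") to a physical source, table, and column, and it emits one SELECT per source table. The class SemanticTranslator, its methods, and all database, table, and column names are illustrative, and joining tables is deliberately left out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not UnityDriver code): translates a semantic query,
 * given as a list of concept names, into one SQL SELECT per source table.
 */
public class SemanticTranslator {

    /** Physical location of a concept; all names used with it are illustrative. */
    record Location(String source, String table, String column) {}

    // Concept name -> every place that concept is stored.
    private final Map<String, List<Location>> dictionary = new LinkedHashMap<>();

    public void map(String concept, String source, String table, String column) {
        dictionary.computeIfAbsent(concept, k -> new ArrayList<>())
                  .add(new Location(source, table, column));
    }

    /** Returns source name -> SQL statements, one per table in that source. */
    public Map<String, List<String>> translate(List<String> concepts) {
        // Group requested columns by source, then by table, keeping request order.
        Map<String, Map<String, List<String>>> grouped = new LinkedHashMap<>();
        for (String concept : concepts) {
            List<Location> locations = dictionary.get(concept);
            if (locations == null) {
                throw new IllegalArgumentException("Unknown concept: " + concept);
            }
            for (Location loc : locations) {
                grouped.computeIfAbsent(loc.source(), s -> new LinkedHashMap<>())
                       .computeIfAbsent(loc.table(), t -> new ArrayList<>())
                       .add(loc.column());
            }
        }
        // Emit one SELECT per (source, table); joins are intentionally omitted.
        Map<String, List<String>> sql = new LinkedHashMap<>();
        grouped.forEach((source, tables) -> {
            List<String> statements = new ArrayList<>();
            tables.forEach((table, columns) ->
                statements.add("SELECT " + String.join(", ", columns) + " FROM " + table));
            sql.put(source, statements);
        });
        return sql;
    }

    public static void main(String[] args) {
        SemanticTranslator translator = new SemanticTranslator();
        // Two hypothetical sources that both store soldier names.
        translator.map("soldier name", "personnelDB", "Soldier", "full_name");
        translator.map("soldier name", "trainingDB", "Trainee", "name");
        translator.map("training course", "trainingDB", "Trainee", "course");
        translator.translate(List.of("soldier name", "training course"))
                  .forEach((source, statements) -> System.out.println(source + " -> " + statements));
    }
}
```

Keeping the dictionary separate from the translation logic means an application only ever names concepts; where those concepts are stored is a property of the dictionary, not of the query.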


Summary of Important Results

The major product of this research is a JDBC driver (see Figure 1) capable of integrating multiple data sources.
The JDBC driver processes high-level user queries and converts them to queries on multiple databases.
Information from the multiple databases is fused together and presented to the user. A unique feature of the
driver is that it handles data inconsistency: information that is inconsistent across the databases is highlighted
and can be used to identify data deserving further investigation.
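The fusion step with inconsistency highlighting can be sketched as follows. This is an illustrative assumption, not the driver's actual algorithm: rows from each source are assumed to share a common key, the fused row keeps the value from the first source that supplies an attribute, and every attribute whose values differ across sources is reported as a Conflict rather than silently discarded. ResultFusion, fuse, Conflict, and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not UnityDriver code): fuses rows from several
 * sources on a shared key and reports attributes whose values disagree.
 */
public class ResultFusion {

    /** An attribute of one fused row whose values differ across sources. */
    public record Conflict(String key, String attribute, Map<String, String> valuesBySource) {}

    /**
     * @param sources   source name -> (row key -> (attribute -> value))
     * @param conflicts receives one entry per inconsistent attribute
     * @return fused rows: row key -> (attribute -> value from the first source that has it)
     */
    public static Map<String, Map<String, String>> fuse(
            Map<String, Map<String, Map<String, String>>> sources,
            List<Conflict> conflicts) {
        // Collect every value as key -> attribute -> source -> value.
        Map<String, Map<String, Map<String, String>>> collected = new LinkedHashMap<>();
        sources.forEach((source, rows) ->
            rows.forEach((key, row) ->
                row.forEach((attribute, value) ->
                    collected.computeIfAbsent(key, k -> new LinkedHashMap<>())
                             .computeIfAbsent(attribute, a -> new LinkedHashMap<>())
                             .put(source, value))));

        Map<String, Map<String, String>> fused = new LinkedHashMap<>();
        collected.forEach((key, attributes) -> {
            Map<String, String> row = new LinkedHashMap<>();
            attributes.forEach((attribute, bySource) -> {
                row.put(attribute, bySource.values().iterator().next());
                // Flag, rather than hide, values that disagree across sources.
                if (bySource.values().stream().distinct().count() > 1) {
                    conflicts.add(new Conflict(key, attribute, bySource));
                }
            });
            fused.put(key, row);
        });
        return fused;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Map<String, String>>> sources = new LinkedHashMap<>();
        sources.put("personnelDB", Map.of("S1", Map.of("name", "Smith", "location", "Iraq")));
        sources.put("deploymentDB", Map.of("S1", Map.of("location", "Kuwait")));

        List<Conflict> conflicts = new ArrayList<>();
        System.out.println(fuse(sources, conflicts));
        conflicts.forEach(c -> System.out.println("Inconsistent: " + c));
    }
}
```

Returning conflicts alongside the fused rows, instead of resolving them inside the fusion step, keeps the outlying values visible to downstream data mining and decision support code.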

This proof-of-concept implementation demonstrates that it is feasible to integrate databases using a JDBC
driver. Using the driver, Java applications that extract information from multiple sources can be rapidly
developed. Since the query language does not force the user to reference particular databases, tables, or fields,
developing an application that accesses multiple databases is no more complex than developing an application
that accesses a single database. Further, the system supports data source evolution, because the mapping process
performed inside the driver allows the queried databases to change without affecting user queries.
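A minimal sketch of why a driver-side mapping absorbs schema evolution, assuming (hypothetically) a single source table and a simple concept-to-column map. The application's query, a list of concept names, never changes; when the source renames a column or table, only the mapping is updated. EvolvingMapping and all names in it are illustrative, not part of UnityDriver.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch (not UnityDriver code): applications name concepts,
 * and a driver-side mapping resolves them to physical columns. A schema
 * change is absorbed by editing the mapping, not the application query.
 */
public class EvolvingMapping {

    private final Map<String, String> conceptToColumn = new HashMap<>();
    private String table;

    public EvolvingMapping(String table) {
        this.table = table;
    }

    /** Points a concept at a (possibly new) physical column. */
    public void remap(String concept, String column) {
        conceptToColumn.put(concept, column);
    }

    /** Records that the source table itself was renamed. */
    public void renameTable(String newTable) {
        this.table = newTable;
    }

    /** Resolves the concepts through the current mapping and builds the SQL. */
    public String toSql(List<String> concepts) {
        String columns = concepts.stream()
            .map(concept -> {
                String column = conceptToColumn.get(concept);
                if (column == null) {
                    throw new IllegalArgumentException("Unmapped concept: " + concept);
                }
                return column;
            })
            .collect(Collectors.joining(", "));
        return "SELECT " + columns + " FROM " + table;
    }

    public static void main(String[] args) {
        // The application's query: concept names only, fixed for its lifetime.
        List<String> userQuery = List.of("soldier name", "rank");

        EvolvingMapping mapping = new EvolvingMapping("Soldier");
        mapping.remap("soldier name", "name");
        mapping.remap("rank", "rank");
        System.out.println(mapping.toSql(userQuery)); // SELECT name, rank FROM Soldier

        // The source schema evolves: a column and the table are renamed.
        mapping.remap("soldier name", "full_name");
        mapping.renameTable("Personnel");
        System.out.println(mapping.toSql(userQuery)); // SELECT full_name, rank FROM Personnel
    }
}
```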


Figure 1. Unity JDBC Driver Architecture


The most important results are:

■ A JDBC driver implementation based on the Unity architecture [Lawrence01], called UnityDriver, that
supports querying multiple databases.

■ A demonstration of how UnityDriver can be used as a platform for developing data mining and information
fusion applications.

■ A high-level query language allowing users to easily query multiple databases.

■ A mapping algorithm for converting high-level queries into SQL and integrating the returned results.

There are two keys to the success of the integration. First, databases are annotated with additional information so
that the concepts in them can be compared more rapidly. Assigning meaningful names is especially important, so
that users can query using familiar names rather than obscure system names. The second key is the ability to
automatically insert local and global joins into a user query. For example, a user may ask the system to
“return all soldiers who have chemical training and are currently stationed in Iraq”. The system would
determine where those concepts are located in the underlying databases and how to combine joins within and across
databases to answer the query without the user’s involvement. The algorithms for automatic join
determination are unique to this work and will be the subject of future publications.
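Because the join-determination algorithms described above are unpublished, the following is only a generic illustration of one common way to insert joins automatically, not the authors' method. It assumes tables are nodes in a graph, known local and cross-database join conditions are edges, and a breadth-first search returns the shortest chain of join conditions linking two tables. JoinPathFinder and all table and column names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/**
 * Generic illustration, NOT the report's unpublished join algorithm:
 * tables are graph nodes, known join conditions are edges, and a
 * breadth-first search finds a chain with the fewest joins.
 */
public class JoinPathFinder {

    /** A join condition between two tables; all names are hypothetical. */
    record Join(String left, String right, String condition) {}

    // Table -> joins touching that table (edges are undirected).
    private final Map<String, List<Join>> edges = new HashMap<>();

    public void addJoin(String left, String right, String condition) {
        Join join = new Join(left, right, condition);
        edges.computeIfAbsent(left, k -> new ArrayList<>()).add(join);
        edges.computeIfAbsent(right, k -> new ArrayList<>()).add(join);
    }

    /** Join conditions linking the two tables, or empty if they are not connected. */
    public Optional<List<String>> findPath(String from, String to) {
        Map<String, Join> reachedVia = new HashMap<>(); // table -> join used to reach it
        Deque<String> queue = new ArrayDeque<>();
        reachedVia.put(from, null);
        queue.add(from);
        while (!queue.isEmpty()) {
            String table = queue.poll();
            if (table.equals(to)) {
                // Walk back from the target, collecting conditions, then reverse.
                List<String> path = new ArrayList<>();
                String current = to;
                while (reachedVia.get(current) != null) {
                    Join join = reachedVia.get(current);
                    path.add(join.condition());
                    current = join.left().equals(current) ? join.right() : join.left();
                }
                Collections.reverse(path);
                return Optional.of(path);
            }
            for (Join join : edges.getOrDefault(table, List.of())) {
                String next = join.left().equals(table) ? join.right() : join.left();
                if (!reachedVia.containsKey(next)) {
                    reachedVia.put(next, join);
                    queue.add(next);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        JoinPathFinder finder = new JoinPathFinder();
        // Hypothetical cross-database join conditions.
        finder.addJoin("personnelDB.Soldier", "trainingDB.Training",
                       "Soldier.id = Training.soldier_id");
        finder.addJoin("personnelDB.Soldier", "deploymentDB.Deployment",
                       "Soldier.id = Deployment.soldier_id");
        // Training and Deployment share no direct join, so the path goes via Soldier.
        System.out.println(finder.findPath("trainingDB.Training", "deploymentDB.Deployment"));
    }
}
```

Breadth-first search is used here only because it yields a path with the fewest joins; a real system would also have to choose among alternative paths of equal length and decide how to execute joins that span databases.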

Publications and Reports

The JDBC driver implementation can be downloaded at http://idealab3.cs.uiowa.edu. The download includes
documentation on how to use the driver, along with sample programs. The driver was tested on integration problems,
and the test programs can be run over the Internet from the same site.

A description of the JDBC driver implementation will be made available in a University of Iowa technical
report and will be submitted for publication in 2004.